Spotify’s Genre Determination

Author

Joshua Chang

Published

March 15, 2023

Introduction

Using the Spotify dataset provided on Kaggle by Andrii Samoshyn, I will be examining how genre can be determined from the audio feature variables that Spotify computes for each track. As Spotify is used widely around the globe, I want to observe and analyze how Spotify is able to assign a genre to a song, which is then used to provide recommendations to users. The dataset contains over 42,000 observations and 22 variables, most of which are audio features such as danceability and acousticness.
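As a sketch of the initial setup, the data can be loaded and summarized as follows (the CSV file name is an assumption; the actual Kaggle download may be named differently):

```r
# Load the data and produce the summary shown below
# (file name "genres_v2.csv" is an assumption about the Kaggle download)
library(tidyverse)
library(skimr)

spotify <- read_csv("genres_v2.csv") %>%
  janitor::clean_names()

skim(spotify)          # variable types, missingness, and distributions
count(spotify, genre)  # class balance across the fifteen genres
```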

Data Overview

Data summary
Name spotify
Number of rows 42305
Number of columns 22
_______________________
Column type frequency:
character 8
numeric 14
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
type 0 1.00 14 14 0 1 0
id 0 1.00 22 22 0 35877 0
uri 0 1.00 36 36 0 35877 0
track_href 0 1.00 56 56 0 35877 0
analysis_url 0 1.00 64 64 0 35877 0
genre 0 1.00 3 15 0 15 0
song_name 20786 0.51 1 138 0 15439 0
title 21525 0.49 4 49 0 132 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
danceability 0 1.00 0.64 0.16 0.07 0.52 0.65 0.77 0.99 ▁▂▇▇▃
energy 0 1.00 0.76 0.18 0.00 0.63 0.80 0.92 1.00 ▁▁▃▅▇
key 0 1.00 5.37 3.67 0.00 1.00 6.00 9.00 11.00 ▇▂▃▅▆
loudness 0 1.00 -6.47 2.94 -33.36 -8.16 -6.23 -4.51 3.15 ▁▁▁▇▂
mode 0 1.00 0.55 0.50 0.00 0.00 1.00 1.00 1.00 ▆▁▁▁▇
speechiness 0 1.00 0.14 0.13 0.02 0.05 0.08 0.19 0.95 ▇▂▁▁▁
acousticness 0 1.00 0.10 0.17 0.00 0.00 0.02 0.11 0.99 ▇▁▁▁▁
instrumentalness 0 1.00 0.28 0.37 0.00 0.00 0.01 0.72 0.99 ▇▁▁▁▂
liveness 0 1.00 0.21 0.18 0.01 0.10 0.14 0.29 0.99 ▇▃▁▁▁
valence 0 1.00 0.36 0.23 0.02 0.16 0.32 0.52 0.99 ▇▇▅▃▁
tempo 0 1.00 147.47 23.84 57.97 129.93 144.97 161.46 220.29 ▁▁▇▃▁
duration_ms 0 1.00 250865.85 102957.71 25600.00 179840.00 224760.00 301133.00 913052.00 ▆▇▂▁▁
time_signature 0 1.00 3.97 0.27 1.00 4.00 4.00 4.00 5.00 ▁▁▁▇▁
unnamed_0 21525 0.49 10483.97 6052.36 0.00 5255.75 10479.50 15709.25 20999.00 ▇▇▇▇▇

Upon initial analysis of the dataset, I noticed that there are no missing values for the variables of interest. The outcome variable that I am examining in response to the audio feature variables is ‘genre’. This variable consists of fifteen unique genres, such as trap, hip-hop, underground rap, and pop. After first loading the data, I noticed that the majority of the observations fall under the ‘Underground Rap’ genre. I also examined the distributions of the audio feature variables by genre, along with some potentially interesting relationships between pairs of variables, such as loudness vs. energy and danceability vs. speechiness, to check for correlations among them.

Methods

For this dataset, I will be utilizing two models: a decision tree and k-nearest neighbors (KNN). The decision tree model is easy to interpret and visualize, and it can identify which audio feature variables are most important in determining the genre of a song. A KNN model classifies a new observation by finding the k observations in the training set that are closest to it and assigning the class most common among those k neighbors. I will be tuning the minimum node size parameter (‘min_n’) for the decision tree and the ‘neighbors’ parameter for the KNN model. The recipe models ‘genre’ as a function of all of the audio feature variables. I will be using repeated V-fold cross-validation to reduce the variance of the performance estimates so that model evaluation is more reliable. To select the best model, I will compare ROC curves and precision values.
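The setup described above can be sketched with tidymodels (predictor names follow the data overview; the split proportion, seed, and fold counts are assumptions, though the `n = 15` column in the result tables is consistent with 5 folds repeated 3 times):

```r
library(tidymodels)

set.seed(123)
spotify_split <- initial_split(spotify, prop = 0.8, strata = genre)
spotify_train <- training(spotify_split)
spotify_test  <- testing(spotify_split)

# Recipe: genre against all of the audio feature variables
spotify_recipe <- recipe(genre ~ danceability + energy + key + loudness +
                           mode + speechiness + acousticness +
                           instrumentalness + liveness + valence +
                           tempo + duration_ms,
                         data = spotify_train) %>%
  step_normalize(all_numeric_predictors())  # KNN is distance-based

# Tunable model specifications
tree_spec <- decision_tree(min_n = tune()) %>%
  set_engine("rpart") %>%
  set_mode("classification")

knn_spec <- nearest_neighbor(neighbors = tune()) %>%
  set_engine("kknn") %>%
  set_mode("classification")

# Repeated V-fold cross-validation (5 folds x 3 repeats = 15 resamples)
spotify_folds <- vfold_cv(spotify_train, v = 5, repeats = 3, strata = genre)
```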

Model Building & Selection Results

After tuning both the decision tree and KNN models using grid search and cross-validation, I was able to achieve improved performance. Overall, the KNN model had a higher ROC AUC than the decision tree model. However, it is important to note that the decision tree model had a higher precision value than the KNN model, indicating that it was better at correctly predicting the positive cases (i.e., correctly identifying the genre).
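The grid search itself can be sketched as follows (the parameter grids match the values in the result tables below; object names carry over from the setup and are illustrative):

```r
# Tune each workflow over its grid using the repeated CV folds
tree_res <- tune_grid(
  workflow() %>% add_recipe(spotify_recipe) %>% add_model(tree_spec),
  resamples = spotify_folds,
  grid = tibble(min_n = c(2, 11, 20)),
  metrics = metric_set(roc_auc, precision, f_meas)
)

knn_res <- tune_grid(
  workflow() %>% add_recipe(spotify_recipe) %>% add_model(knn_spec),
  resamples = spotify_folds,
  grid = tibble(neighbors = c(1, 10, 20)),
  metrics = metric_set(roc_auc, precision, f_meas)
)

collect_metrics(tree_res)  # averaged metrics across the 15 resamples
collect_metrics(knn_res)
```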

Further tuning could be explored in the future, such as adjusting the number of neighbors for the knn model or exploring different splitting criteria for the decision tree model. Additionally, other models could be explored and compared, such as random forests or support vector machines.

In terms of systematic differences between the model types, the decision tree achieved higher precision, while the KNN model achieved a higher ROC AUC, indicating better overall classification performance.

Based on my comparison of the performance metrics, I selected the KNN model as the final model. While the decision tree had a higher precision value, I prioritized overall classification performance, where the KNN model excelled with its higher ROC AUC (0.887 vs. 0.839). It was not particularly surprising that the KNN model won, as it is a commonly used classification algorithm known to perform well in many scenarios.

# A tibble: 9 × 7
  min_n .metric   .estimator  mean     n std_err .config             
  <int> <chr>     <chr>      <dbl> <int>   <dbl> <chr>               
1     2 f_meas    macro      0.565    15 0.00311 Preprocessor1_Model1
2     2 precision macro      0.608    15 0.00205 Preprocessor1_Model1
3     2 roc_auc   hand_till  0.839    15 0.00104 Preprocessor1_Model1
4    11 f_meas    macro      0.565    15 0.00311 Preprocessor1_Model2
5    11 precision macro      0.608    15 0.00205 Preprocessor1_Model2
6    11 roc_auc   hand_till  0.839    15 0.00104 Preprocessor1_Model2
7    20 f_meas    macro      0.565    15 0.00311 Preprocessor1_Model3
8    20 precision macro      0.608    15 0.00205 Preprocessor1_Model3
9    20 roc_auc   hand_till  0.839    15 0.00104 Preprocessor1_Model3
# A tibble: 9 × 7
  neighbors .metric   .estimator  mean     n  std_err .config             
      <int> <chr>     <chr>      <dbl> <int>    <dbl> <chr>               
1         1 f_meas    macro      0.481    15 0.00119  Preprocessor1_Model1
2         1 precision macro      0.477    15 0.00116  Preprocessor1_Model1
3         1 roc_auc   hand_till  0.724    15 0.000673 Preprocessor1_Model1
4        10 f_meas    macro      0.498    15 0.000912 Preprocessor1_Model2
5        10 precision macro      0.498    15 0.000960 Preprocessor1_Model2
6        10 roc_auc   hand_till  0.861    15 0.000629 Preprocessor1_Model2
7        20 f_meas    macro      0.502    15 0.00106  Preprocessor1_Model3
8        20 precision macro      0.517    15 0.00152  Preprocessor1_Model3
9        20 roc_auc   hand_till  0.887    15 0.000755 Preprocessor1_Model3
# A tibble: 2 × 3
  model ROC_AUC       se
  <chr>   <dbl>    <dbl>
1 Tree    0.839 0.00104 
2 KNN     0.887 0.000755

Final Model Analysis

The final KNN model was selected based on its high ROC AUC score. It was refit on the training data and evaluated on the held-out testing data, with performance assessed using a confusion matrix. The confusion matrix below shows substantial confusion among related genres: predictions concentrate in a handful of classes, and several genres (Emo, Hiphop, Pop, Rap, RnB, and Trap Metal) are never predicted at all, with many of their tracks assigned to Underground Rap instead.
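A sketch of the final fit and confusion matrix, assuming the split and tuning objects from the earlier steps (names are illustrative):

```r
# Finalize the KNN workflow with the best neighbors value, fit once on
# the training set, and evaluate on the held-out test set
best_knn <- select_best(knn_res, metric = "roc_auc")

final_knn_fit <- workflow() %>%
  add_recipe(spotify_recipe) %>%
  add_model(knn_spec) %>%
  finalize_workflow(best_knn) %>%
  last_fit(spotify_split)

final_preds <- collect_predictions(final_knn_fit)
conf_mat(final_preds, truth = genre, estimate = .pred_class)
```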

The outcome variable was not transformed in this analysis.

Overall, the KNN model was the best performing model out of the ones tested, but it is important to note that the effort of building a predictive model should always be weighed against the payoff. In this case, the model did not dramatically outperform a null model, which would simply predict the most frequent genre (Underground Rap) every time.
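For reference, such a null model can be fit directly with parsnip's `null_model()` so its metrics are computed on the same resamples (a sketch; object names carry over from the earlier steps):

```r
# Baseline: always predict the most common genre in the training data
null_spec <- null_model() %>%
  set_engine("parsnip") %>%
  set_mode("classification")

null_res <- workflow() %>%
  add_recipe(spotify_recipe) %>%
  add_model(null_spec) %>%
  fit_resamples(resamples = spotify_folds,
                metrics = metric_set(accuracy, roc_auc))

collect_metrics(null_res)
```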

One feature of the KNN model that likely made it the best was its ability to fit nonlinear decision boundaries. However, there is still room for further exploration and tuning of the model in future analyses.

# A tibble: 8,462 × 2
   genre     .pred_class    
   <chr>     <fct>          
 1 Dark Trap Underground Rap
 2 Dark Trap dnb            
 3 Dark Trap Underground Rap
 4 Dark Trap Underground Rap
 5 Dark Trap Underground Rap
 6 Dark Trap Underground Rap
 7 Dark Trap Underground Rap
 8 Dark Trap Underground Rap
 9 Dark Trap Underground Rap
10 Dark Trap Underground Rap
# … with 8,452 more rows
                 Truth
Prediction        Dark Trap  dnb  Emo hardstyle Hiphop  Pop psytrance  Rap  RnB
  Dark Trap             160    9   25        14     23    5         3    8   21
  dnb                    28  314   29         2      6    2         0    0    5
  Emo                     0    0    0         0      0    0         0    0    0
  hardstyle              94  127  151       462     18    8         7    6   15
  Hiphop                  0    0    0         0      0    0         0    0    0
  Pop                     0    0    0         0      0    0         0    0    0
  psytrance              74   85    1        78      0    0       479    1    1
  Rap                     0    0    0         0      0    0         0    0    0
  RnB                     0    0    0         0      0    0         0    0    0
  techhouse              15    4   18         2      8    9        23    9   10
  techno                 71    3    1         1      1    1        60    1    0
  trance                 12    0    2         2      1    0         4    0    0
  trap                   10   26    9        47      3    3         2    2    1
  Trap Metal              0    0    0         0      0    0         0    0    0
  Underground Rap       473   43   94        20    544   74         5  349  410
                 Truth
Prediction        techhouse techno trance trap Trap Metal Underground Rap
  Dark Trap               1      5     26    5         20              23
  dnb                     0      0      0   11         17               5
  Emo                     0      0      0    0          0               0
  hardstyle               7      1    149  132         33              24
  Hiphop                  0      0      0    0          0               0
  Pop                     0      0      0    0          0               0
  psytrance              46    100    258   76          4               5
  Rap                     0      0      0    0          0               0
  RnB                     0      0      0    0          0               0
  techhouse             363     43     12    7          5               6
  techno                 77    428     25    8          7               9
  trance                  0      1     54    2          6               1
  trap                    1      0      5  248         39              12
  Trap Metal              0      0      0    0          0               0
  Underground Rap        67     12     28  121        229            1064

Conclusion

In conclusion, my analysis shows that it is possible to predict music genre from audio features with a reasonable degree of accuracy. Among the models and tuning strategies explored, the k-nearest neighbors (KNN) algorithm with the Euclidean distance metric performed best in terms of ROC AUC on the testing set. The final model achieved a strong ROC AUC, indicating that it distinguishes between genres well above chance, though the confusion matrix shows that class-level predictions remain imperfect.

It is worth noting that while the KNN model performed well, there is still room for improvement. One potential avenue for future work could be to explore more advanced machine learning algorithms, such as neural networks or gradient boosting, which may be able to capture more complex relationships between the audio features and genre. Additionally, it may be beneficial to consider other features or data sources, such as lyrics or artist information, to further improve the predictive performance.

Overall, the results of this analysis demonstrate the potential of machine learning to automate the genre classification process in music, which can be useful in various applications such as music recommendation systems, playlist curation, and music indexing.

References

Andrii Samoshyn, “Dataset of songs in Spotify,” Kaggle: https://www.kaggle.com/datasets/mrmorj/dataset-of-songs-in-spotify?resource=download